A Bayesian Discriminating Features Method for Face Detection

Chengjun Liu * 

Abstract: This paper presents a novel Bayesian Discriminating Features (BDF) method for multiple frontal face detection. The BDF method, which is trained on images from only one database yet works on test images from diverse sources, displays robust generalization performance. The novelty of this paper comes from the integration of the discriminating feature analysis of the input image, the statistical modeling of face and nonface classes, and the Bayes classifier for multiple frontal face detection. First, feature analysis derives a discriminating feature vector by combining the input image, its 1-D Haar wavelet representation, and its amplitude projections. While the Haar wavelets produce an effective representation for object detection, the amplitude projections capture the vertical symmetric distributions and the horizontal characteristics of human face images. Second, statistical modeling estimates the conditional probability density functions, or PDFs, of the face and nonface classes, respectively. While the face class is usually modeled as a multivariate normal distribution, the nonface class is much more difficult to model because it includes "the rest of the world". The estimation of such a broad category is, in practice, intractable. However, one can still derive a subset of the nonfaces that lie closest to the face class, and then model this particular subset as a multivariate normal distribution. Finally, the Bayes classifier applies the estimated conditional PDFs to detect multiple frontal faces in an image. Experimental results using 887 images (containing a total of 1,034 faces) from diverse image sources show the feasibility of the BDF method. In particular, the novel BDF method achieves 98.5% face detection accuracy with one false detection.

Index Terms: Bayes classifier, Bayesian Discriminating Features (BDF), discriminating feature analysis, face detection, statistical modeling, support nonfaces
E-mail: liu@cs.njit.edu.






1 Introduction 

Among the most challenging tasks for visual form analysis and object recognition are understanding how people process and recognize each other's faces, and developing corresponding computational models for automated face recognition [8], [10]. An automated face recognition system
includes several related face processing tasks, such as detection of a pattern as a face, face track- 
ing in a video sequence, face verification, and face recognition. Face detection generally learns the 
statistical models of the face and nonface images, and then applies a two-class classification rule to 
discriminate between face and nonface patterns. Face tracking predicts the motion of faces in a se- 
quence of images based on their previous trajectories and estimates the current and future positions 
of those faces. While face verification is mainly concerned with authenticating a claimed identity 
posed by a person, face recognition focuses on recognizing the identity of a person from a database 
of known individuals. 

This paper presents a Bayesian Discriminating Features (BDF) method for multiple frontal face 
detection by integrating feature analysis, modeling, and the Bayes classifier. The main contributions 
of the paper come from (i) discriminating feature analysis of the input image, (ii) statistical modeling 
of face and nonface classes, and (iii) the application of the Bayes classifier for multiple frontal face 
detection. 

First, the discriminating feature analysis combines the input image, its 1-D Haar wavelet representation, and its amplitude projections. Recent research has shown that the 2-D Haar wavelet representation is effective for human face and pedestrian detection [14]. For efficiency, this paper incorporates the 1-D Haar wavelet representation to define the discriminating features. The amplitude projections, namely the column and row projections, capture the vertical symmetric distributions and the horizontal characteristics of human face images. By combining the input image, its 1-D Haar wavelet representation, and its amplitude projections, the new feature vector gains enhanced discriminating power for face detection.

Second, statistical modeling of face and nonface classes essentially estimates the conditional prob- 
ability density functions, or PDFs, of the two classes. While the face class is usually modeled as a 
multivariate normal distribution, the nonface class is much more difficult to model due to the fact that it includes "the rest of the world". The estimation of such a broad category is, in practice,
intractable. However, one can still derive a subset of the nonfaces that lie closest to the face class, 
and then model this particular subset of nonfaces as a multivariate normal distribution. The idea of 
using a subset of nonfaces to design the face detection algorithm is motivated by a recent statistical learning system, support vector machines, or SVMs. In an SVM, only the support vectors, the patterns that lie close to the maximal margin hyperplane, are involved in designing the system. Thus, by analogy to SVMs, the BDF method first locates the "support nonfaces" and then models this particular subset of nonfaces as a multivariate normal distribution.

Finally, the BDF method applies the Bayes classifier for multiple frontal face detection. The Bayes 
classifier yields the minimum error when the underlying PDFs of the face and nonface classes are 
known. This error, called the Bayes error, is the optimal measure for feature effectiveness when 
classification is of concern, since it is a measure of class separability [3].

The BDF method is trained using 600 FERET face images (Batch 15) [16] and 9 natural images. 
Experimental results using 887 images (containing a total of 1,034 faces) from diverse image sources 
show the feasibility of the BDF method. In particular, the novel BDF method achieves 98.5% face 
detection accuracy with one false detection, and compares favorably against the state-of-the-art face 
detection algorithms, such as the Schneiderman-Kanade method [21], [22]. 

The novelty of this paper thus comes from: (i) the discriminating feature analysis of the input image, its 1-D Haar wavelet representation, and its amplitude projections; (ii) statistical modeling of the face class and reducing the dimensionality of the feature vector to a very small number, M, which is 10 in our experiments; (iii) nonface class modeling based on the concept of SVM, which models only a small subset of the nonfaces that lie closest to the face class (note that, in general, the nonface class includes "the rest of the world", which makes its estimation practically intractable; the introduction of the "support nonfaces" in this paper makes the nonface class modeling tractable); (iv) the application of the Bayes classifier with a modified decision rule for multiple frontal face detection (note that the modified decision rule, Eq. 25, introduces a control parameter, θ, which eliminates the nonfaces that are not close to the face class, so that only those subimages that are close enough to the face class are passed to the Bayes decision rule; this modified decision rule thus validates the nonface class modeling, which models only those nonfaces that lie closest to the face class rather than "the rest of the world"); (v) the development of the single response criterion and the early exclusion criterion for computational efficiency; and (vi) the comprehensive assessment of the BDF method for face detection using images from diverse sources, and the comparative assessment of the BDF method against state-of-the-art face detection algorithms, such as the Schneiderman-Kanade method [21], [22].

2 Background 

Face detection is the first stage of an automated face recognition system, since a face has to be located before it is recognized. Earlier efforts focused on correlation or template matching, matched filtering, subspace methods, deformable templates, etc. [15], [28]. For comprehensive surveys of these early methods, see [23], [1], and [20]. Recent approaches emphasize data-driven, learning-based techniques, such as the statistical modeling methods [12], [24], [21], [22], [26], [25], the neural network-based learning methods [18], [19], [24], the statistical learning theory and SVM-based methods [4], [13], [6], the Markov random field-based methods [2], [17], and color-based face detection [7]. For recent surveys, see [5], [27].

The statistical methods usually start with the estimation of the distributions of the face and non- 
face patterns, and then apply a pattern classifier or a face detector to search over a range of scales and 
locations for possible human faces. The neural network-based methods, however, learn to discrimi- 
nate the implicit distributions of the face and nonface patterns by means of training samples and the 
network structure, without involving an explicit estimation procedure. Moghaddam and Pentland 
[12] applied unsupervised learning to estimate the density in a high-dimensional eigenspace and 
derived a maximum likelihood method for single face detection. Rather than using PCA for dimen- 
sionality reduction, they implemented the eigenspace decomposition as an integral part of estimating 
the conditional PDF in the original high-dimensional image space. Face detection is then carried out 
by computing multiscale saliency maps based on the maximum likelihood formulation. Sung and 
Poggio [24] presented an example-based learning method by means of modeling the distributions 
of face and nonface patterns. To cope with the variability of face images, they empirically chose 
six Gaussian clusters to model the distributions for face and nonface patterns, respectively. The
density functions of the distributions are then fed to a multiple layer perceptron for face detection. 
Schneiderman and Kanade [21] proposed a face detector based on the estimation of the posterior 
probability function, which captures the joint statistics of local appearance and position as well as 
the statistics of local appearance in the visual world. To detect side views of a face, profile images 
were added to the training set to incorporate such statistics [22]. Rowley et al. [18] developed a neural network-based upright, frontal face detection system, which applies a retinally connected neural network to examine small windows of an image and decide whether each window contains a face. The face detector, which was trained using a large number of face and nonface examples, contains a set of neural network-based filters and an arbitrator, which merges detections from individual filters and eliminates overlapping detections. In order to detect faces at any degree of rotation in the image plane, the system was extended to incorporate a separate router network, which determines the orientation of the face pattern. The pattern is then derotated back to the upright position so that it can be processed by the earlier system [19].

3 Bayesian Discriminating Features Method for Face Detection 

The Bayesian Discriminating Features (BDF) method, which displays robust generalization per- 
formance, works by integrating the discriminating feature analysis of the input image, the statistical 
modeling of face and nonface classes, and the Bayes classifier for multiple frontal face detection. 
This section details these major components of the BDF method. 

3.1 Discriminating Feature Analysis

The discriminating feature analysis derives a new feature vector with enhanced discriminating power for face detection by combining the input image, its 1-D Haar wavelet representation, and its amplitude projections. While the Haar wavelet representation has been shown to be effective for human face and pedestrian detection [14], the amplitude projections are able to capture the vertical symmetric distributions and the horizontal characteristics of human face images.

Let $I(i,j) \in \mathbb{R}^{m \times n}$ represent an input image (e.g., training images for face and nonface classes, or subimages of test images), and $X \in \mathbb{R}^{mn}$ be the vector formed by concatenating the rows (or columns) of $I(i,j)$. The 1-D Haar representation of $I(i,j)$ yields two images, $I_h(i,j) \in \mathbb{R}^{(m-1) \times n}$ and $I_v(i,j) \in \mathbb{R}^{m \times (n-1)}$, corresponding to the horizontal and vertical difference images, respectively:

$$I_h(i,j) = I(i+1,j) - I(i,j), \qquad 1 \le i < m,\ 1 \le j \le n \tag{1}$$

$$I_v(i,j) = I(i,j+1) - I(i,j), \qquad 1 \le i \le m,\ 1 \le j < n \tag{2}$$

Let $X_h \in \mathbb{R}^{(m-1)n}$ and $X_v \in \mathbb{R}^{m(n-1)}$ be the vectors formed by concatenating the rows (or columns) of $I_h(i,j)$ and $I_v(i,j)$, respectively.

The amplitude projections of $I(i,j)$ along its rows and columns form the horizontal (row) and vertical (column) projections, $X_r \in \mathbb{R}^{m}$ and $X_c \in \mathbb{R}^{n}$, respectively:

$$X_r(i) = \sum_{j=1}^{n} I(i,j), \qquad 1 \le i \le m \tag{3}$$

$$X_c(j) = \sum_{i=1}^{m} I(i,j), \qquad 1 \le j \le n \tag{4}$$

Before forming a new feature vector, the vectors $X$, $X_h$, $X_v$, $X_r$, and $X_c$ are each normalized by subtracting the mean of their components and dividing by the standard deviation of their components. Let $\hat{X}$, $\hat{X}_h$, $\hat{X}_v$, $\hat{X}_r$, and $\hat{X}_c$ be the normalized vectors. A new feature vector $\tilde{Y} \in \mathbb{R}^{N}$ is defined as the concatenation of the normalized vectors:

$$\tilde{Y} = \left(\hat{X}^t\ \hat{X}_h^t\ \hat{X}_v^t\ \hat{X}_r^t\ \hat{X}_c^t\right)^t \tag{5}$$

where $t$ is the transpose operator, and $N = 3mn$ is the dimensionality of the feature vector $\tilde{Y}$, since $mn + (m-1)n + m(n-1) + m + n = 3mn$. Finally, the normalized vector of $\tilde{Y}$ defines the discriminating feature vector, $Y \in \mathbb{R}^{N}$, which is the feature vector for the multiple frontal face detection system, and which combines the input image, its 1-D Haar wavelet representation, and its amplitude projections for enhanced discriminating power:

$$Y = \frac{\tilde{Y} - \mu}{\sigma} \tag{6}$$

where $\mu$ and $\sigma$ are the mean and the standard deviation of the components of $\tilde{Y}$, respectively.
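To make Eqs. 1-6 concrete, here is a minimal Python sketch (standard library only) of how the discriminating feature vector could be assembled from a gray-level image stored as a list of rows. The function names `zscore` and `bdf_feature_vector` are illustrative, not taken from the paper, and several details are assumptions: rows (not columns) are concatenated, the population standard deviation is used, and a constant vector is mapped to zeros. For a 16 × 16 window the result has N = 3mn = 768 components.

```python
import math
from typing import List

Image = List[List[float]]  # m rows by n columns of gray levels


def zscore(v: List[float]) -> List[float]:
    """Subtract the mean of the components and divide by their standard deviation."""
    mu = sum(v) / len(v)
    sd = math.sqrt(sum((x - mu) ** 2 for x in v) / len(v))  # population std (assumption)
    if sd == 0.0:
        return [0.0] * len(v)  # constant vector: return zeros (assumption)
    return [(x - mu) / sd for x in v]


def bdf_feature_vector(img: Image) -> List[float]:
    """Discriminating feature vector Y of Eqs. 1-6 (assumes m, n >= 2)."""
    m, n = len(img), len(img[0])
    x = [img[i][j] for i in range(m) for j in range(n)]                        # X, rows concatenated
    x_h = [img[i + 1][j] - img[i][j] for i in range(m - 1) for j in range(n)]  # Eq. 1
    x_v = [img[i][j + 1] - img[i][j] for i in range(m) for j in range(n - 1)]  # Eq. 2
    x_r = [sum(img[i][j] for j in range(n)) for i in range(m)]                 # Eq. 3
    x_c = [sum(img[i][j] for i in range(m)) for j in range(n)]                 # Eq. 4
    y_tilde = zscore(x) + zscore(x_h) + zscore(x_v) + zscore(x_r) + zscore(x_c)  # Eq. 5
    return zscore(y_tilde)                                                     # Eq. 6


# Usage on a synthetic 16 x 16 window (the standard resolution of Sect. 4.1).
window = [[float((3 * i + j) % 7) for j in range(16)] for i in range(16)]
y = bdf_feature_vector(window)
```

Normalizing each part before concatenation keeps the pixel block, which has the most components, from dominating the shorter projection vectors.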
3.2 Statistical Modeling of Face and Nonface Classes

The main objective of statistical modeling of face and nonface classes is to estimate the conditional probability density functions, or PDFs, of these two classes. While the face class contains only faces, the nonface class encompasses all the other objects, i.e., "the rest of the world". Even though it is reasonable to assume that the face class has a multivariate normal distribution, the same assumption is hard to justify for the nonface class. The BDF method, however, derives a subset of the nonfaces that lie closest to the face class, and then models this particular subset of nonfaces as a multivariate normal distribution. Choosing a subset of nonfaces for the BDF method resembles choosing support vectors in the design of support vector machines. In fact, the support vectors are the samples that lie closest to the decision hyperplane of an SVM, and are therefore the most important data for determining the optimum location of the decision hyperplane. The same idea is applied here to design the optimal decision surface for face detection by choosing the "support nonfaces" that lie closest to the face class, i.e., closest to the decision surface between the face and nonface classes.

3.2.1 Face Class Modeling 

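The conditional density function of the face class, $\omega_f$, is modeled as a multivariate normal distribution:

$$p(Y|\omega_f) = \frac{1}{(2\pi)^{N/2} |\Sigma_f|^{1/2}} \exp\left\{ -\frac{1}{2} (Y - M_f)^t \Sigma_f^{-1} (Y - M_f) \right\} \tag{7}$$

where $M_f \in \mathbb{R}^{N}$ and $\Sigma_f \in \mathbb{R}^{N \times N}$ are the mean and the covariance matrix of face class $\omega_f$, respectively. Taking the natural logarithm of both sides, we have

$$\ln[p(Y|\omega_f)] = -\frac{1}{2} \left\{ (Y - M_f)^t \Sigma_f^{-1} (Y - M_f) + N \ln(2\pi) + \ln|\Sigma_f| \right\} \tag{8}$$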
The covariance matrix, $\Sigma_f$, can be factorized into the following form using principal component analysis, or PCA [9]:

$$\Sigma_f = \Phi_f \Lambda_f \Phi_f^t \quad \text{with} \quad \Phi_f \Phi_f^t = \Phi_f^t \Phi_f = I_N, \quad \Lambda_f = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_N\} \tag{9}$$

where $\Phi_f \in \mathbb{R}^{N \times N}$ is an orthogonal eigenvector matrix, $\Lambda_f \in \mathbb{R}^{N \times N}$ a diagonal eigenvalue matrix with diagonal elements (the eigenvalues) in decreasing order ($\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$), and $I_N \in \mathbb{R}^{N \times N}$ an identity matrix. An important property of PCA is its optimal signal reconstruction in the sense of minimum mean-square error when only a subset of principal components is used to represent the original signal [11]. The principal components are defined by the following vector, $Z \in \mathbb{R}^{N}$:

$$Z = \Phi_f^t (Y - M_f) \tag{10}$$

It then follows from Eqs. 8, 9, and 10 that

$$\ln[p(Y|\omega_f)] = -\frac{1}{2} \left\{ Z^t \Lambda_f^{-1} Z + N \ln(2\pi) + \ln|\Lambda_f| \right\} \tag{11}$$

Note that the components of $Z$ are the principal components. Applying the optimal signal reconstruction property of PCA, we use only the first $M$ ($M < N$) principal components to estimate the conditional density function. We further adopt a model by Moghaddam and Pentland [12] that estimates the remaining $N - M$ eigenvalues, $\lambda_{M+1}, \lambda_{M+2}, \ldots, \lambda_N$, by the average of those values:

$$\rho = \frac{1}{N - M} \sum_{k=M+1}^{N} \lambda_k \tag{12}$$

Note that from Eq. 10, we have $\|Z\|^2 = \|Y - M_f\|^2$, where $\|\cdot\|$ denotes the norm operator. This result shows that the PCA transformation, which is an orthonormal transformation, does not change the norm. Now, it follows from Eqs. 11 and 12 that

$$\ln[p(Y|\omega_f)] = -\frac{1}{2} \left\{ \sum_{i=1}^{M} \frac{z_i^2}{\lambda_i} + \frac{\|Y - M_f\|^2 - \sum_{i=1}^{M} z_i^2}{\rho} + \ln\left(\prod_{i=1}^{M} \lambda_i\right) + (N - M)\ln\rho + N \ln(2\pi) \right\} \tag{13}$$

where the $z_i$'s are the components of $Z$ defined by Eq. 10. Eq. 13 states that the conditional density function of the face class can be estimated using the first $M$ principal components, the input image, the mean face, and the eigenvalues of the face class.
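The following minimal Python sketch evaluates Eq. 13 for one class, given the class mean, its first M eigenvectors and eigenvalues, and the residual average ρ of Eq. 12. The function names are hypothetical and the eigenvectors are assumed to be unit-norm; the same routine would apply to the nonface class with the nonface parameters. The toy check uses a diagonal covariance, where the approximation is exact because the single discarded eigenvalue equals its own average.

```python
import math
from typing import Sequence


def residual_average(eigenvalues: Sequence[float], m: int) -> float:
    """Eq. 12: average of the N - M eigenvalues beyond the first M."""
    rest = list(eigenvalues[m:])
    return sum(rest) / len(rest)


def log_density_pca(y: Sequence[float], mean: Sequence[float],
                    eigvecs: Sequence[Sequence[float]], eigvals: Sequence[float],
                    rho: float) -> float:
    """Eq. 13: approximate ln p(Y | class) from the first M principal components.

    eigvecs holds the first M unit eigenvectors (one per row) and eigvals the
    matching eigenvalues lambda_1 >= ... >= lambda_M.
    """
    n, m = len(y), len(eigvecs)
    d = [a - b for a, b in zip(y, mean)]                          # Y - M_f
    z = [sum(p * q for p, q in zip(phi, d)) for phi in eigvecs]   # first M entries of Eq. 10
    dist2 = sum(v * v for v in d)                                 # ||Y - M_f||^2
    in_subspace = sum(zi * zi / lam for zi, lam in zip(z, eigvals))
    residual = (dist2 - sum(zi * zi for zi in z)) / rho          # energy outside the M components
    log_det = sum(math.log(lam) for lam in eigvals) + (n - m) * math.log(rho)
    return -0.5 * (in_subspace + residual + log_det + n * math.log(2.0 * math.pi))


# Toy check (N = 2, M = 1): covariance diag(4, 1), eigenvector (1, 0) for lambda_1 = 4.
rho = residual_average([4.0, 1.0], 1)
approx = log_density_pca([2.0, 1.0], [0.0, 0.0], [[1.0, 0.0]], [4.0], rho)
exact = -0.5 * (2.0 ** 2 / 4.0 + 1.0 ** 2 / 1.0 + math.log(4.0) + 2 * math.log(2.0 * math.pi))
```

Only M inner products are needed per window, which is what makes M = 10 attractive when N = 768.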

3.2.2 Nonface Class Modeling 

The nonface class modeling starts with the generation of nonface samples by applying Eq. 13 to natural images that do not contain any human faces at all. Those subimages of the natural scenes that lie closest to the face class are chosen as training samples for the estimation of the conditional density function of the nonface class, $\omega_n$, which is also modeled as a multivariate normal distribution:

$$p(Y|\omega_n) = \frac{1}{(2\pi)^{N/2} |\Sigma_n|^{1/2}} \exp\left\{ -\frac{1}{2} (Y - M_n)^t \Sigma_n^{-1} (Y - M_n) \right\} \tag{14}$$

where $M_n \in \mathbb{R}^{N}$ and $\Sigma_n \in \mathbb{R}^{N \times N}$ are the mean and the covariance matrix of nonface class $\omega_n$, respectively.

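Factorize the covariance matrix, $\Sigma_n$, using PCA [9]:

$$\Sigma_n = \Phi_n \Lambda_n \Phi_n^t \quad \text{with} \quad \Phi_n \Phi_n^t = \Phi_n^t \Phi_n = I_N, \quad \Lambda_n = \mathrm{diag}\{\lambda_1^{(n)}, \lambda_2^{(n)}, \ldots, \lambda_N^{(n)}\} \tag{15}$$

where $\Phi_n \in \mathbb{R}^{N \times N}$ is an orthogonal eigenvector matrix, $\Lambda_n \in \mathbb{R}^{N \times N}$ a diagonal eigenvalue matrix with diagonal elements (the eigenvalues) in decreasing order ($\lambda_1^{(n)} \ge \lambda_2^{(n)} \ge \cdots \ge \lambda_N^{(n)}$), and $I_N \in \mathbb{R}^{N \times N}$ an identity matrix. The principal component vector, $U \in \mathbb{R}^{N}$, is defined as follows:

$$U = \Phi_n^t (Y - M_n) \tag{16}$$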

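Estimate the remaining $N - M$ eigenvalues, $\lambda_{M+1}^{(n)}, \lambda_{M+2}^{(n)}, \ldots, \lambda_N^{(n)}$, by the average of those values:

$$\epsilon = \frac{1}{N - M} \sum_{k=M+1}^{N} \lambda_k^{(n)} \tag{17}$$

The conditional density function of the nonface class can then be estimated as follows:

$$\ln[p(Y|\omega_n)] = -\frac{1}{2} \left\{ \sum_{i=1}^{M} \frac{u_i^2}{\lambda_i^{(n)}} + \frac{\|Y - M_n\|^2 - \sum_{i=1}^{M} u_i^2}{\epsilon} + \ln\left(\prod_{i=1}^{M} \lambda_i^{(n)}\right) + (N - M)\ln\epsilon + N \ln(2\pi) \right\} \tag{18}$$

where the $u_i$'s are the components of $U$ defined by Eq. 16. Eq. 18 states that the conditional density function of the nonface class can be estimated using the first $M$ principal components, the input image, the mean nonface, and the eigenvalues of the nonface class.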

3.3 The Bayesian Classifier for Multiple Frontal Face Detection

After modeling the conditional PDFs of the face and nonface classes, the BDF method applies the Bayes classifier for multiple frontal face detection, since the Bayes classifier yields the minimum error when the underlying PDFs are known. This error, called the Bayes error, is the optimal measure for feature effectiveness when classification is of concern, since it is a measure of class separability [3].

Let $Y \in \mathbb{R}^{N}$ be the discriminating feature vector constructed from an input pattern, i.e., a subimage of some test image (see Sect. 3.1). Let the a posteriori probabilities of face class ($\omega_f$) and nonface class ($\omega_n$) given $Y$ be $P(\omega_f|Y)$ and $P(\omega_n|Y)$, respectively. The pattern is classified to the face class or the nonface class according to the Bayes decision rule for minimum error [3]:

$$Y \in \begin{cases} \omega_f & \text{if } P(\omega_f|Y) > P(\omega_n|Y) \\ \omega_n & \text{otherwise} \end{cases} \tag{19}$$



Note that the Bayes decision rule optimizes the class separability in the sense of the Bayes error, and hence should yield the best face detection performance when the conditional PDFs are accurately estimated.

The a posteriori probabilities, $P(\omega_f|Y)$ and $P(\omega_n|Y)$, can be computed from the conditional PDFs defined in Sects. 3.2.1 and 3.2.2 using Bayes' theorem:

$$P(\omega_f|Y) = \frac{P(\omega_f)\, p(Y|\omega_f)}{p(Y)}, \qquad P(\omega_n|Y) = \frac{P(\omega_n)\, p(Y|\omega_n)}{p(Y)} \tag{20}$$

where $P(\omega_f)$ and $P(\omega_n)$ are the a priori probabilities of face class $\omega_f$ and nonface class $\omega_n$, respectively, and $p(Y)$ is the mixture density function.
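For N = 768 the log-densities of Eqs. 13 and 18 can be very negative, so exponentiating them directly may underflow to zero. Below is a hedged sketch of evaluating Eq. 20 in the log domain (the log-sum-exp trick). It assumes the two classes are exhaustive, so that P(ω_n) = 1 − P(ω_f) and p(Y) = P(ω_f)p(Y|ω_f) + P(ω_n)p(Y|ω_n); the function name is illustrative.

```python
import math


def posterior_face(log_p_face: float, log_p_nonface: float, prior_face: float) -> float:
    """Eq. 20 for the face class, computed in the log domain.

    Assumes two exhaustive classes, so P(w_n) = 1 - P(w_f) and
    p(Y) = P(w_f) p(Y|w_f) + P(w_n) p(Y|w_n); requires 0 < prior_face < 1.
    """
    a = math.log(prior_face) + log_p_face            # ln[P(w_f) p(Y|w_f)]
    b = math.log(1.0 - prior_face) + log_p_nonface   # ln[P(w_n) p(Y|w_n)]
    top = max(a, b)
    log_mixture = top + math.log(math.exp(a - top) + math.exp(b - top))  # ln p(Y)
    return math.exp(a - log_mixture)
```

With equal priors and log-densities of -1000 and -1002, `math.exp(-1000.0)` on its own evaluates to 0.0, whereas `posterior_face(-1000.0, -1002.0, 0.5)` returns 1/(1 + e^{-2}), about 0.881. The decision rule itself never needs the posterior values, only their comparison, which is what the δ form of the rule exploits.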
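From Eqs. 13, 18, and 20, the Bayes decision rule for face detection is then defined as follows:

$$Y \in \begin{cases} \omega_f & \text{if } \delta_f + \tau < \delta_n \\ \omega_n & \text{otherwise} \end{cases} \tag{21}$$

where $\delta_f$, $\delta_n$, and $\tau$ are as follows ($\epsilon$ is the average of the remaining $N - M$ nonface eigenvalues, the nonface counterpart of $\rho$ in Eq. 12):

$$\delta_f = \sum_{i=1}^{M} \frac{z_i^2}{\lambda_i} + \frac{\|Y - M_f\|^2 - \sum_{i=1}^{M} z_i^2}{\rho} + \ln\left(\prod_{i=1}^{M} \lambda_i\right) + (N - M)\ln\rho \tag{22}$$

$$\delta_n = \sum_{i=1}^{M} \frac{u_i^2}{\lambda_i^{(n)}} + \frac{\|Y - M_n\|^2 - \sum_{i=1}^{M} u_i^2}{\epsilon} + \ln\left(\prod_{i=1}^{M} \lambda_i^{(n)}\right) + (N - M)\ln\epsilon \tag{23}$$

$$\tau = 2 \ln \frac{P(\omega_n)}{P(\omega_f)} \tag{24}$$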
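The step from Eqs. 19 and 20 to Eqs. 21-24 can be checked as follows. Here $\delta_f$ and $\delta_n$ collect every term inside the braces of Eqs. 13 and 18 except $N\ln(2\pi)$, so that $\ln p(Y|\omega_f) = -\tfrac{1}{2}(\delta_f + N\ln 2\pi)$ and likewise for $\omega_n$; the common factor $p(Y) > 0$ cancels when the two posteriors are compared.

```latex
\begin{aligned}
P(\omega_f|Y) > P(\omega_n|Y)
&\iff P(\omega_f)\,p(Y|\omega_f) > P(\omega_n)\,p(Y|\omega_n) \\
&\iff \ln P(\omega_f) - \tfrac{1}{2}\left(\delta_f + N\ln(2\pi)\right)
      > \ln P(\omega_n) - \tfrac{1}{2}\left(\delta_n + N\ln(2\pi)\right) \\
&\iff \delta_f + 2\ln\frac{P(\omega_n)}{P(\omega_f)} < \delta_n \\
&\iff \delta_f + \tau < \delta_n
\end{aligned}
```

The shared $N\ln(2\pi)$ term cancels, which is why it does not appear in $\delta_f$ or $\delta_n$. In the experiments, $\tau$ is treated as an empirically chosen control parameter rather than computed from estimated priors.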

$\delta_f$ and $\delta_n$ can be calculated from the input pattern $Y$, the face class parameters (the mean face, the first $M$ eigenvectors, and the eigenvalues), and the nonface class parameters (the mean nonface, the first $M$ eigenvectors, and the eigenvalues). $\tau$ is a constant that functions as a control parameter: the larger its value, the fewer the false detections. To further control the false detection rate, the BDF method introduces another control parameter, $\theta$, to the face detection system, such that

$$Y \in \begin{cases} \omega_f & \text{if } (\delta_f < \theta) \text{ and } (\delta_f + \tau < \delta_n) \\ \omega_n & \text{otherwise} \end{cases} \tag{25}$$
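Once $\delta_f$ and $\delta_n$ are available, Eq. 25 reduces to two comparisons. A minimal sketch follows, with the defaults τ = 300 and θ = 500 taken from Sect. 4.1; the function name and the δ values in the usage lines are made up for illustration.

```python
def is_face(delta_f: float, delta_n: float, tau: float = 300.0, theta: float = 500.0) -> bool:
    """Eq. 25: accept a window only if it is close to the face class (delta_f < theta)
    and it passes the Bayes test of Eq. 21 (delta_f + tau < delta_n)."""
    return delta_f < theta and delta_f + tau < delta_n


# Illustrative (made-up) distance values:
print(is_face(120.0, 480.0))   # accepted: 120 < 500 and 420 < 480
print(is_face(120.0, 400.0))   # rejected by the Bayes test: 420 is not below 400
print(is_face(600.0, 1500.0))  # rejected by theta even though 900 < 1500
```

Checking `delta_f < theta` first also lets a detector skip computing δ_n for windows that are far from the face class, since `and` short-circuits.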

The control parameters, $\tau$ and $\theta$, are empirically chosen for the face detection system.

4 Experiments

The Bayesian Discriminating Features (BDF) method integrates feature analysis, statistical modeling, and the Bayes classifier for multiple frontal face detection. The training data for the BDF method consist of 600 FERET frontal face images from Batch 15 [16] and 9 natural images. The face class thus contains 1,200 face samples for training after including the mirror images of the FERET data, and the nonface class consists of 4,500 nonface samples, which are generated by choosing the subimages of the 9 natural images that lie closest to the face class.

Three testing data sets, SET1, SET2, and SET3, are created to evaluate the face detection performance of the BDF method. SET1, consisting of all the frontal face images of Batches 12, 13, and 14 from the FERET database [16], contains mainly head or head-and-shoulder pictures, as shown in Fig. 3. SET2, consisting of all the frontal face images from FERET Batch 2, contains upper-body pictures, as shown in Fig. 4. SET3 consists of images chosen from the MIT-CMU test sets [18] that contain frontal faces. Table 1 shows the configurations of these test sets. In particular, SET1 and SET2 consist of 511 and 296 images, respectively. Note that each image in SET1 and SET2 contains only one face.

SET3¹, chosen from the MIT-CMU test sets [18], consists of 80 images containing a total of 227 faces. As the BDF method addresses the detection of frontal and real human faces, the images that contain large pose-angled faces, line-drawn faces, poker faces, masked faces, or cartoon faces are not included in this set. Note that the testing data are more diverse than the training data, which consist of images from only one database. Experimental results, however, show that the BDF method, which is trained on a simple image set yet works on much more complex images, displays robust generalization performance.

¹ The 80 images (from website: http://vasc.rixmu.edu/IUS/eyesMsrl7/har/harl/usri) are listed as follows: albert.gif, Argentina.gif, audrey1.gif, audrey2.gif, audrybt1.gif, baseball.gif, bksomels.gif, brian.gif, bwolen.gif, cfb.gif, churchill-downs.gif, class57.gif, cluttered-tahoe.gif, cnn1085.gif, cnn1160.gif, cnn1260.gif, cnn1714.gif, cnn2020.gif, cnn2221.gif, cnn2600.gif, crimson.gif, ds9.gif, ew-courtney-david.gif, ew-friends.gif, fleetwood-mac.gif, frisbee.gif, Germany.gif, giant-panda.gif, gigi.gif, gpripe.gif, harvard.gif, hendrix2.gif, henry.gif, john.coltrane.gif, kaari-stef.gif, kaari1.gif, kaari2.gif, karen-and-rob.gif, knex0.gif, kymberly.gif, lacrosse.gif, larroquette.gif, madaboutyou.gif, married.gif, me.gif, mom-baby.gif, mona-lisa.gif, natalie1.gif, nens.gif, oksana1.gif, pittsburgh-park.gif, police.gif, sarah4.gif, sarahdiveJ2.gif, seinfeld.gif, shumeet.gif, soccer.gif, speed.gif, tahoe-and-rick.gif, tammy.gif, tommyrw.gif, tori-crucify.gif, tori-entweekly.gif, tori-live3.gif, torrance.gif, tp-reza-girosi.gif, tp.gif, tree-roots.gif, trek-trio.gif, trekcolr.gif, tress-photo-2.gif, tress-photo.gif, u2-cover.gif, uprooted-tree.gif, voyager2.gif, wall.gif, window.gif, wxm.gif, yellow-pages.gif, ysato.gif.

Table 1. Testing data sets SET1, SET2, and SET3, and testing performance.

Data set | Sources                      | Images | Faces | Detected | False detections
SET1     | FERET Batches 12, 13, and 14 | 511    | 511   | 507      | 0
SET2     | FERET Batch 2                | 296    | 296   | 290      | 0
SET3     | MIT-CMU Test Sets            | 80     | 227   | 221      | 1
Total    |                              | 887    | 1,034 | 1,018    | 1

4.1 Statistical Learning of the BDF Method

The statistical modeling of the face and nonface classes requires the estimation of the parameters of these two classes from the training images. The face class parameters are computed as follows:

(i) Normalize the 600 FERET images to a spatial resolution of 16 × 16, which is the standard resolution used in this paper for multiple frontal face detection. Fig. 1 (a) shows some examples of the training faces that have been normalized to the standard resolution. Note that the normalization is based on fixed eye locations and interocular distance.

(ii) Add the mirror images of the 600 FERET faces to the face training set, increasing the number of training samples to 1,200.

(iii) Incorporate the 1-D Haar wavelet representation and the amplitude projections into the face images and derive the discriminating feature vectors as detailed in Sect. 3.1.

(iv) Derive the face class parameters: the mean face, and the face class eigenvectors and eigenvalues. Fig. 2 (a) shows the mean face, its 1-D Haar wavelet representation, and its amplitude projections. The first image is the mean face. The second and third images are the horizontal and vertical difference images of the mean face, respectively, which correspond to the 1-D Haar wavelet representation. The last two bar graphs plot the horizontal (row) and vertical (column) projections of the mean face, which correspond to the amplitude projections. From the figure, one can see that the amplitude projections are able to capture the vertical symmetric distributions and the horizontal characteristics of human face images.

(v) The face class parameters also include M, the number of principal components used to model 
the conditional PDF of face class. A good choice of M should balance both the face detection perfor- 
mance and the computational complexity. M is empirically chosen to equal 10 for the experiments 
in this paper. 
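Steps (ii), (iv), and (v) can be sketched in plain Python: mirror augmentation, the sample mean, the covariance matrix, its eigen-decomposition, the first M eigenpairs, and the residual average ρ of Eq. 12. This is a hedged illustration, not the paper's implementation. Assumptions: "mirror image" is taken to mean a left-right flip; the eigen-decomposition uses cyclic Jacobi rotations (a standard method swapped in here, since the text only says PCA); the covariance is normalized by the number of samples; and the toy samples are flattened 2 × 2 images rather than the 768-dimensional features of Sect. 3.1. Pure-Python Jacobi is far too slow for N = 768, where a numerical linear algebra library would be the practical choice.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def mirror(img: Matrix) -> Matrix:
    """Left-right mirror image (assumed meaning of 'mirror image' in step (ii))."""
    return [row[::-1] for row in img]


def jacobi_eigh(a: Matrix, sweeps: int = 50, tol: float = 1e-12) -> Tuple[List[float], Matrix]:
    """Cyclic Jacobi eigen-decomposition of a small symmetric matrix.

    Returns the eigenvalues in decreasing order and the matching unit
    eigenvectors, one per row.
    """
    n = len(a)
    a = [row[:] for row in a]                                  # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulates rotations
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the new a[p][q] is zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^t A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P (eigenvectors are the columns of V)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def class_parameters(samples: Matrix, m: int):
    """Mean, first M eigenvectors and eigenvalues, and rho (Eq. 12) of one class."""
    count, n = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / count for j in range(n)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / count
            for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigh(cov)
    rho = sum(vals[m:]) / (n - m)
    return mean, vecs[:m], vals[:m], rho


# Toy usage: three 2 x 2 "faces" plus their mirror images (step ii), flattened
# row by row as a stand-in for the Sect. 3.1 feature vectors.
faces = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [5.0, 1.0]], [[0.0, 3.0], [1.0, 1.0]]]
training = faces + [mirror(f) for f in faces]
samples = [[x for row in f for x in row] for f in training]
mean, vecs, vals, rho = class_parameters(samples, m=2)
```

Because every face is paired with its mirror image, the resulting mean face is exactly left-right symmetric, which is consistent with the symmetric column projection visible in the mean face.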

The learning of the nonface class parameters starts with the generation of nonface samples from the 9 natural images that do not contain any human faces at all. Fig. 1 (b) shows an example of these natural scene images. The nonface images, chosen from the subimages of these 9 natural images, have the standard spatial resolution of 16 × 16 and lie closest to the face class, whose parameters have just been computed and whose conditional PDF is specified by Eq. 13. In particular, 4,500 nonface samples are generated from the 9 natural images. Fig. 2 (b) shows the mean nonface, its 1-D Haar wavelet representation, and its amplitude projections. Note that the images and projections in Fig. 2 (b) resemble their counterparts in Fig. 2 (a) because the nonface samples lie close to the face class. After the generation of the nonface samples, the nonface class parameters can be calculated in the same way as the face class parameters.
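Selecting the support nonfaces amounts to scanning the natural images with a fixed-size window and keeping the windows that lie closest to the face class. A minimal sketch follows. `face_distance` is a hypothetical callback (for instance, the negative of the face-class log-density of Eq. 13 evaluated on the window's feature vector, so that smaller means closer), and the regular grid, the stride, and the single scale are assumptions, since the text does not state how the 4,500 windows were sampled.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[float]]


def subimages(img: Image, size: int, stride: int) -> Iterator[Tuple[int, int, Image]]:
    """Yield (row, col, window) for each size x size window on a regular grid."""
    m, n = len(img), len(img[0])
    for r in range(0, m - size + 1, stride):
        for c in range(0, n - size + 1, stride):
            yield r, c, [row[c:c + size] for row in img[r:r + size]]


def support_nonfaces(natural_images: List[Image], face_distance: Callable[[Image], float],
                     count: int, size: int = 16, stride: int = 4) -> List[Tuple[float, int, int, int]]:
    """Return (distance, image index, row, col) of the `count` windows closest to the face class."""
    candidates = []
    for k, img in enumerate(natural_images):
        for r, c, win in subimages(img, size, stride):
            candidates.append((face_distance(win), k, r, c))
    candidates.sort()  # smallest distance first, i.e. closest to the face class
    return candidates[:count]


# Toy usage: one 8 x 8 "natural image", 4 x 4 windows, and a stand-in distance.
def toy_distance(win: Image) -> float:
    return sum(v for row in win for v in row)  # hypothetical; not the paper's criterion


scene = [[float((i * j) % 5) for j in range(8)] for i in range(8)]
picked = support_nonfaces([scene], toy_distance, count=2, size=4, stride=4)
```

The returned coordinates let the selected windows be cut out again and turned into feature vectors for estimating the nonface mean, eigenvectors, and eigenvalues.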

Finally, the BDF method has to set the values of the two control parameters, $\tau$ and $\theta$, whose function is to control the false detection rate. These two control parameters are empirically chosen for the BDF face detection system and are set to 300 for $\tau$ and 500 for $\theta$.
4.2 Testing Performance of the BDF Method

The BDF method is applied to detect frontal faces from three testing data sets: SET1, SET2, and SET3. SET1 and SET2 are from the FERET database [16], and the major differences between these two sets are that (i) SET1 contains mainly head or head-and-shoulder pictures, while SET2 contains upper-body pictures; and (ii) SET2 contains more face images with glasses, some of which have bright reflections. Table 1 shows the detection performance of the BDF method on SET1 and SET2. In particular, the BDF method successfully detects 507 faces from the 511 images in SET1 (each image contains only one face) without any false detection. Fig. 3 shows examples of the detected faces from SET1, where a square indicates a successfully detected face region. Note that the resolution of the images is 256 × 384, and the faces are detected at different scales. The BDF method again successfully detects 290 faces from the 296 images with no false detection when it is applied to SET2. Fig. 4 shows examples of the detected faces from SET2, which contains face images with glasses having bright reflections.
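The detection accuracies implied by Table 1 can be recomputed directly from its counts. The short script below is an illustrative check, not part of the original experiments; the names are arbitrary.

```python
# Counts copied from Table 1: (images, faces, detected faces, false detections).
TABLE_1 = {
    "SET1": (511, 511, 507, 0),
    "SET2": (296, 296, 290, 0),
    "SET3": (80, 227, 221, 1),
}


def detection_rate(detected: int, faces: int) -> float:
    """Percentage of faces detected."""
    return 100.0 * detected / faces


for name, (_, faces, detected, false_det) in TABLE_1.items():
    print(f"{name}: {detection_rate(detected, faces):.1f}% detected, {false_det} false detection(s)")

total_faces = sum(row[1] for row in TABLE_1.values())
total_detected = sum(row[2] for row in TABLE_1.values())
print(f"Total: {detection_rate(total_detected, total_faces):.1f}%")
```

It prints 99.2%, 98.0%, and 97.4% for SET1, SET2, and SET3, and 98.5% overall (1,018 of 1,034 faces), matching the figure quoted in the abstract.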

Up to now, the training and testing face data are from the FERET database: the training face data are from Batch 15, and the testing data are from Batches 12, 13, and 14 for SET1, and from Batch 2 for SET2. To test the generalization performance of the BDF method, a third testing data set, SET3, is created from the MIT-CMU test sets [18]. SET3 consists of 80 images that contain a total of 227 faces. Note that the training face images for the BDF method are from only one database, but the test images in SET3 are from diverse sources: some of the images are from the World Wide Web, some are scanned from photographs and newspaper pictures, and some are digitized from broadcast television [18]. Some images contain many faces of different sizes (Fig. 5); some include rotated faces (Figs. 6 and 7); some have very large faces (Fig. 8) or very small faces (Fig. 9); yet others involve low-quality face images (Fig. 10), partially occluded faces, or slightly pose-angled faces (Fig. 11). Experiments based on such a simple training set and such a diverse testing set thus provide a demanding test of the generalization performance of the BDF method.

Fig. 5 shows the results of detection of multiple frontal faces. The BDF method successfully 
detects all the face images except a face with a large pose in image (b) and a (downward) pose-angled 
face in image (d). In particular, all the 15 faces in image (a) are detected at the scales 22, 26, and 
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27, respectively. Note that the scale 22 means that the original image is resized by a ratio, ^ 
faces in image (b) are successfully detected at the scales 20 and 26, respectively. Note that one face 
with a large pose is not detected, since the BDF method is trained to detect multiple frontal faces. 
Image (c) contains 57 faces and they are detected at the scales 20, 25, and 30, respectively. Image 
(d) contains 14 faces and 13 complete faces are detected at the scale 20, with a (downward) pose- 
angled face missed. The two faces in image (e) are detected at the scales 40 and 44, respectively, 
and the three faces in image (f) are detected at a scale of 30. Image (g) contains two faces, which 
are detected at scales 33 and 55, respectively, while image (h) has two faces, which are detected at 
a scale of 30. The system misses one (downward) pose-angled face in image (d) and one face with a
large pose in image (b) because it is trained to detect frontal faces, and the training images do not
contain any pose-angled faces.

The BDF method, trained only on upright frontal faces, can also detect rotated faces in test
images by rotating the test images by a number of predefined angles, such as ±5°, ±10°,
±15°, and ±20°. Fig. 6 shows the results of detection of multiple frontal faces with rotations. The 
BDF method successfully detects all the faces in the 6 test images. In particular, image (a) requires 
2 scales (26, 29) and 2 rotations (-10°, -15°) for the detection of all the faces, image (b) requires 
2 scales (36, 38) and 1 rotation (-10°), image (c) requires 2 scales (20, 23) and 2 rotations (-10°, 
20°), image (d) requires 2 scales (30, 38) and 1 rotation (-20°), image (e) requires 1 scale (25) and 
1 rotation (-10°), and image (f) requires 2 scales (35, 36) and 1 rotation (-20°). Fig. 7 shows some 
additional examples of rotated face detection using the BDF method. Images (a), (b), and (c) are 
rotated -10° for the detection of the faces, while image (d) is rotated -5° for the detection of the
face.
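When a test image is rotated before detection, the detected windows live in the rotated frame and must be mapped back to the original image before they are reported. The text does not spell out this step, so the following is only a sketch under stated assumptions: each rotated copy is taken to be the original rotated about its center by one of the predefined angles, `detect_at_angle(angle)` is a hypothetical callback standing in for the BDF detector run on that copy, and the counterclockwise, y-up sign convention is an assumption that must match whatever image-rotation routine is actually used.

```python
import math

# Predefined rotation angles from the text (degrees); 0 is the upright pass.
ANGLES = [0, 5, -5, 10, -10, 15, -15, 20, -20]


def rotate_point(x, y, angle_deg, cx, cy):
    """Rotate (x, y) by angle_deg about (cx, cy), counterclockwise with y pointing up."""
    t = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))


def detect_rotated(center, detect_at_angle, angles=ANGLES):
    """Collect detections from rotated copies, mapped back to the original frame.

    detect_at_angle(angle) is a hypothetical callback returning the (x, y)
    centers of windows accepted in the image rotated by `angle` about `center`.
    """
    cx, cy = center
    found = []
    for angle in angles:
        for x, y in detect_at_angle(angle):
            # A point rotated by +angle is returned to the original frame by -angle.
            found.append((angle, rotate_point(x, y, -angle, cx, cy)))
    return found
```

Keeping the angle with each detection makes it possible to draw the detection square at the orientation at which the face was found, as in Figs. 6 and 7.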

The BDF method is also tested on images that contain very large faces or very small faces. Fig. 8 
and Fig. 9 show the face detection performance on these test images, respectively. All the faces in 
Fig. 8 and Fig. 9 are successfully detected. Since the BDF method is trained on real face images,
it does not detect the hand-drawn face in Fig. 9, which demonstrates the robustness of the BDF
method in detecting real faces.

The generalization performance of the BDF method is further tested using low-quality face images,
partially occluded faces, and slightly pose-angled faces. Fig. 10 and Fig. 11 show the face
detection performance of the BDF method on these three categories of faces. All the faces in
Fig. 10 and Fig. 11 are successfully detected, which again shows the robustness of the BDF method
in real face detection. Note that the last image in Fig. 10 is rotated 5° for the detection of the
face.

For SET3, there are 6 faces that the BDF method does not detect: 3 are pose-angled, 1 is a baby
face, 1 is a masked face, and 1 is in a low-quality image. In particular, one large pose-angled face
in Fig. 5 (b) and one (downward) pose-angled face in Fig. 5 (d) are not detected by the BDF method.
Fig. 12 shows some other examples of missed faces and false detection: a masked face in (a), a baby
face in (b), and a slightly pose-angled face in (c) are not detected. Also, a false detection occurs
in (c). The experimental results using 80 test images (containing a total of 227 faces) from the
MIT-CMU test sets show that the BDF method detects 221 of the 227 faces in these images with 1
false detection.

Table 1 summarizes the detection performance of the BDF method for the testing data sets: SET1, 
SET2, and SET3. The overall face detection performance of the BDF method using the 887 images 
containing a total of 1,034 faces is 98.5% correct face detection rate with one false detection. 

4.3 Comparative Face Detection Performance

Among the state-of-the-art face detection algorithms, the Schneiderman-Kanade method [21],
[22] is publicly available at http://www.vasc.ri.cmu.edu/cgi-bin/demos/findface.cgi. This method has
two thresholds, the frontal detection threshold and the profile detection threshold, which control the
number of faces detected and the number of false detections: increasing these thresholds decreases
both numbers. Table 2 shows the comparative face detection performance of the Schneiderman-
Kanade method and the BDF method on the testing data set SET3, which contains 80 images and
227 faces. Note that the two numbers in the parentheses correspond to the frontal detection threshold
and the profile detection threshold, respectively.
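Each detection rate in Table 2 is the fraction of the 227 SET3 faces that a configuration detects. Recomputing the rates from the reported face counts is a quick consistency check; the dictionary keys below are just labels for the table rows.

```python
TOTAL_FACES = 227  # faces in SET3 (80 images)

# Faces detected on SET3, as reported in Table 2.
detected = {
    "Schneiderman-Kanade (1.0, 1.0)": 218,
    "Schneiderman-Kanade (2.0, 2.0)": 214,
    "Schneiderman-Kanade (3.0, 3.0)": 208,
    "BDF method": 221,
}


def detection_rate(n_detected, n_total=TOTAL_FACES):
    """Percentage of faces detected, rounded to one decimal place."""
    return round(100.0 * n_detected / n_total, 1)


for name, n in detected.items():
    print(f"{name}: {detection_rate(n)}%")
```

The four rates come out as 96.0%, 94.3%, 91.6%, and 97.4%, matching the table.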

Experimental results² show that the Schneiderman-Kanade method achieves a 96.0% detection rate
with 41 false detections when the thresholds are set to 1.0. The detection rate decreases as the
thresholds increase: 94.3% with 5 false detections when the thresholds are 2.0, and 91.6% with
1 false detection when the thresholds are 3.0. The BDF method, achieving 97.4% face detection
accuracy with one false detection, thus compares favorably against state-of-the-art face detection
algorithms such as the Schneiderman-Kanade method [21], [22].

² The experimental results of the Schneiderman-Kanade method are derived by submitting the images
in SET3 to the face detector at http://www.vasc.ri.cmu.edu/cgi-bin/demos/findface.cgi. The complete
face detection results for SET3 using the BDF method are available at
http://ymwxs^yiUduriiu/RESEARCH/fd/f<Lhtml. Note that the BDF method is a frontal face detection
method and cannot detect large pose-angled faces in an image. The Schneiderman-Kanade method,
however, is capable of detecting both frontal and profile faces.

Table 2. Comparative face detection performance of the Schneiderman-Kanade method and the
BDF method on the testing data set SET3, which contains 80 images and 227 faces. Note that the
two numbers in the parentheses of the Schneiderman-Kanade method correspond to the frontal
detection threshold and the profile detection threshold, respectively, which control the number of
faces detected and the number of false detections.

Method                            Faces detected   False detections   Detection rate
Schneiderman-Kanade (1.0, 1.0)    218              41                 96.0%
Schneiderman-Kanade (2.0, 2.0)    214              5                  94.3%
Schneiderman-Kanade (3.0, 3.0)    208              1                  91.6%
The BDF method                    221              1                  97.4%

4.4 Computational Efficiency of the BDF Method

The computational efficiency of the BDF method is mainly due to two criteria, namely, the single
response criterion and the early exclusion criterion. The single response criterion avoids multiple
responses to a single face, while the early exclusion criterion uses a heuristic procedure to eliminate
subimages that cannot be faces.

Fig. 13 (a) shows the idea of the single response criterion. Let the searching order of the subimages
be from top to bottom, and then from left to right. Suppose a face is detected and a point p, the first
pixel (the upper left pixel) of the subimage, is used to represent this face. For simplicity, we use
the upper left pixel to represent a 16 x 16 subimage in the following discussion. Now we want to
search a small neighborhood of p, say, 7 x 7, in order to find among these 49 face candidates the 
one that lies closest to the face class. Note that 49 face candidates do not mean 49 faces, since some 
of the candidates may not be classified as face. But at least we know p corresponds to a face, hence 
there should be one face defined by one of these 49 pixels, and our purpose is to find the one that 
lies closest to the face class. Due to the predefined searching order, half of these neighbors have
already been searched; the remaining unsearched neighbors are the pixels inside the area A. Note
that each of these 24 neighbors defines a 16 x 16 subimage, which could be a face. Suppose q
defines the face that lies closest to the face class; then all the pixels inside area B, which defines the
half 16 x 16 neighborhood of the pixel q, should not be searched again due to the non-overlapping
assumption of human faces. Note that the non-overlapping assumption really means that we are
interested in detecting complete faces. Fig. 13 (a) shows that once a face is detected, 470 subimages
are excluded from further processing; the area B is an eliminated area. As a result, the single
response criterion improves the speed of face detection by excluding subimages in eliminated areas
from further processing. Note that when carrying the eliminated area from one scale to another,
one should shrink the size of the neighborhood by 1 or 2 pixels in order to detect closely adjacent
or partially overlapping faces such as those shown in Fig. 5 (e).
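The single response criterion can be sketched as follows. The 16 x 16 window and the 7 x 7 neighborhood come from the text; everything else is an assumption for illustration. `dist` is a hypothetical precomputed array giving, for each window position (upper left pixel), the distance of that subimage to the face class (smaller is closer), and a window counts as a face when `dist < threshold`. The scan is assumed to be row by row (rows top to bottom, pixels left to right), so only neighbors later in that order are still unsearched. The eliminated area is implemented as every window position whose 16 x 16 window overlaps the face at q; its unsearched part corresponds to area B. The 1 to 2 pixel shrink across scales is not modeled.

```python
import numpy as np

WIN = 16   # detection window size (16 x 16 subimages, as in the text)
NBHD = 7   # neighborhood searched around a detection (7 x 7, as in the text)


def single_response_scan(dist, threshold):
    """Return upper-left corners of accepted faces, one response per face.

    dist[y, x] is a hypothetical distance of the window at (y, x) to the
    face class; windows with dist < threshold are treated as faces.
    """
    h, w = dist.shape
    r = NBHD // 2
    eliminated = np.zeros((h, w), dtype=bool)
    faces = []
    for y in range(h):                      # assumed raster order
        for x in range(w):
            if eliminated[y, x] or dist[y, x] >= threshold:
                continue
            # p = (y, x) is a face; among p and its unsearched 7 x 7
            # neighbors, keep the candidate closest to the face class.
            best = (y, x)
            for yy in range(y, min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    if yy == y and xx <= x:
                        continue            # p itself or already searched
                    if not eliminated[yy, xx] and dist[yy, xx] < dist[best]:
                        best = (yy, xx)
            faces.append(best)
            qy, qx = best
            # Non-overlapping assumption: no window overlapping q is searched again.
            eliminated[max(0, qy - WIN + 1):qy + WIN,
                       max(0, qx - WIN + 1):qx + WIN] = True
    return faces
```

For example, two nearby face responses at (5, 5) and (7, 6) collapse into the single closer one, while a distant face at (30, 30) is still found.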

To further improve the computational efficiency, we define a heuristic procedure that excludes
subimages that cannot be faces at all, such as some homogeneous background. As the major
computation takes place in the discriminating feature analysis and the evaluation of the Bayes
decision rule, an early exclusion of such nonface subimages would greatly improve the computational
efficiency of the BDF face detection system. Fig. 13 (b) shows a 16 x 16 subimage with 3 labeled
regions corresponding to the left eye area (A), the nose bridge area (B), and the right eye area (C),
respectively. The exclusion criterion is based on some simple statistics. First, calculate the mean
values, $\mu_A$, $\mu_B$, and $\mu_C$, of regions A, B, and C, respectively. Then, compute the
average values, $m_A$ and $m_C$, of the pixels whose intensity values are above the mean values of
regions A and C, respectively, and compute the average value, $m_B$, of the pixels whose intensity
values are below the mean value of region B. If no pixels in a region are above (or below) the mean
value, the mean value is assigned to the average value; such a region is homogeneous, i.e., all its
pixels have the same intensity value. Finally, the exclusion criterion states that a subimage is
excluded from further processing if

$$m_A < \kappa\, m_B \quad \text{or} \quad m_C < \kappa\, m_B,$$

where $\kappa$, $0 < \kappa < 1$, is a control factor.
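The exclusion rule above can be sketched as follows, implementing exactly the statistics and inequality stated in the text. The pixel coordinates of regions A, B, and C are hypothetical placeholders (the paper defines the regions only graphically in Fig. 13 (b)), and `kappa=0.8` is an assumed value within the stated range 0 < κ < 1.

```python
import numpy as np

# Hypothetical region layout (rows, cols) inside a 16 x 16 subimage; the
# actual regions are defined only graphically in Fig. 13 (b).
REGION_A = (slice(4, 8), slice(2, 7))    # left eye area
REGION_B = (slice(4, 8), slice(7, 9))    # nose bridge area
REGION_C = (slice(4, 8), slice(9, 14))   # right eye area


def _mean_above(region):
    """Average of pixels above the region mean; the mean if none are above."""
    mu = region.mean()
    above = region[region > mu]
    return above.mean() if above.size else mu


def _mean_below(region):
    """Average of pixels below the region mean; the mean if none are below."""
    mu = region.mean()
    below = region[region < mu]
    return below.mean() if below.size else mu


def early_exclude(sub, kappa=0.8):
    """True if the subimage is excluded: m_A < kappa*m_B or m_C < kappa*m_B."""
    sub = np.asarray(sub, dtype=float)
    m_a = _mean_above(sub[REGION_A])
    m_b = _mean_below(sub[REGION_B])
    m_c = _mean_above(sub[REGION_C])
    return bool(m_a < kappa * m_b or m_c < kappa * m_b)
```

Because the test needs only a few means per subimage, it is far cheaper than computing the discriminating feature vector and evaluating the Bayes decision rule, which is why it pays off as an early filter.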

The running time of the BDF face detection system is determined mainly by the number of subimages
the system has to process. Currently, it takes the system an average of 1 second to process a 320 x 240
image without any scaling (containing 68,625 subimages in total) on a 900 MHz Sun Blade 1000
workstation. Note that images of different complexity require different processing times.
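The count of 68,625 subimages is consistent with sliding a 16 x 16 window one pixel at a time over a 320 x 240 image, since (320 - 16 + 1) x (240 - 16 + 1) = 305 x 225 = 68,625. The unit stride is inferred from this arithmetic rather than stated in the text; `num_subimages` is a hypothetical helper name.

```python
def num_subimages(width, height, win=16):
    """Count win x win windows at unit stride in a width x height image."""
    if width < win or height < win:
        return 0
    return (width - win + 1) * (height - win + 1)


print(num_subimages(320, 240))  # 68625
```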

5 Discussion 

This paper presents a novel Bayesian Discriminating Features (BDF) method for multiple frontal
face detection. The BDF method, which is trained on images from only one database yet works
on test images from diverse sources, displays robust generalization performance. The novelty of
this paper comes from the integration of the discriminating feature analysis of the input image, the
statistical modeling of face and nonface classes, and the Bayes classifier for multiple frontal face
detection. First, feature analysis derives a discriminating feature vector by combining the input
image, its 1-D Haar wavelet representation, and its amplitude projections. Second, statistical modeling
estimates the conditional probability density functions, or PDFs, of the face and nonface classes.
Finally, the Bayes classifier applies the estimated conditional PDFs to detect multiple frontal faces
in an image. The BDF method is trained using 600 FERET face images and 9 natural images.
Experimental results using 887 images (containing a total of 1,034 faces) from diverse image sources
show the feasibility of the BDF method. In particular, the novel BDF method achieves 98.5% face
detection accuracy with one false detection.
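The feature analysis step can be sketched as follows. This section names only the three ingredients (the input image, its 1-D Haar wavelet representation, and its amplitude projections); the exact filters and normalization are defined elsewhere in the paper. The neighboring-pixel differences, row and column sums, and per-part zero-mean, unit-variance normalization below are therefore assumptions, and the function names are hypothetical.

```python
import numpy as np


def haar_1d(img):
    """Assumed 1-D Haar responses: neighboring-pixel differences along rows and columns."""
    horizontal = img[:, 1:] - img[:, :-1]
    vertical = img[1:, :] - img[:-1, :]
    return horizontal, vertical


def amplitude_projections(img):
    """Row sums and column sums of the image."""
    return img.sum(axis=1), img.sum(axis=0)


def _normalize(v):
    """Zero mean, unit variance (left at zero mean if the part is constant)."""
    v = v - v.mean()
    s = v.std()
    return v / s if s > 0 else v


def discriminating_feature_vector(img):
    """Concatenate the normalized image, Haar responses, and projections."""
    img = np.asarray(img, dtype=float)
    horizontal, vertical = haar_1d(img)
    rows, cols = amplitude_projections(img)
    parts = [img.ravel(), horizontal.ravel(), vertical.ravel(), rows, cols]
    return np.concatenate([_normalize(p) for p in parts])
```

For a 16 x 16 subimage this sketch yields 256 + 240 + 240 + 16 + 16 = 768 features; normalizing each part separately keeps one ingredient from dominating the others simply because of its scale.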

Closely related to the BDF method is the maximum likelihood method developed by Moghaddam
and Pentland [12] for single face detection. The BDF method differs from this maximum likelihood
method in the following aspects: (i) the discriminating feature analysis, which integrates the input
image, its 1-D Haar wavelet representation, and its amplitude projections; (ii) the statistical modeling
of the nonface class; (iii) the application of the Bayes classifier for multiple frontal face detection;
(iv) the computational efficiency due to the design of the single response criterion and the early
exclusion criterion; and (v) multiple frontal face detection. Note that the maximum likelihood
method [12] does not contain nonface modeling. In analogy to support vector machines, the BDF
method first locates the support nonfaces and then models this particular subset of nonfaces as a
multivariate normal distribution.
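The nonface modeling and the Bayes decision rule can be sketched as follows, under stated assumptions. The "support nonfaces" are taken here to be the k nonface samples with the smallest Mahalanobis distance to the face class; this selection rule is hypothetical, since the paper's own procedure is described elsewhere. Both classes are modeled with full sample covariances, and the priors are equal by default. As in the decision rule described for the BDF method, a pattern is classified as a face when its face posterior exceeds its nonface posterior.

```python
import numpy as np


def fit_gaussian(X):
    """Sample mean and covariance of the rows of X (n_samples x n_features)."""
    return X.mean(axis=0), np.cov(X, rowvar=False)


def mahalanobis_sq(X, mu, cov):
    """Squared Mahalanobis distance of each row of X to N(mu, cov)."""
    d = np.atleast_2d(X) - mu
    return np.einsum("ij,ij->i", d, np.linalg.solve(cov, d.T).T)


def select_support_nonfaces(nonfaces, face_model, k):
    """Hypothetical rule: keep the k nonfaces lying closest to the face class."""
    dist = mahalanobis_sq(nonfaces, *face_model)
    return nonfaces[np.argsort(dist)[:k]]


def log_gaussian(x, mu, cov):
    """log N(x; mu, cov) for one feature vector x."""
    _, logdet = np.linalg.slogdet(cov)
    maha = mahalanobis_sq(x, mu, cov)[0]
    return -0.5 * (maha + logdet + len(mu) * np.log(2 * np.pi))


def is_face(x, face_model, nonface_model, prior_face=0.5):
    """Bayes decision rule: face iff P(face | x) > P(nonface | x)."""
    lf = log_gaussian(x, *face_model) + np.log(prior_face)
    ln = log_gaussian(x, *nonface_model) + np.log(1.0 - prior_face)
    return bool(lf > ln)
```

Typical use: fit the face model on training faces, select the support nonfaces with `select_support_nonfaces`, fit the nonface model on that subset, and call `is_face` on each candidate feature vector. With real high-dimensional feature vectors, full sample covariances are usually poorly conditioned, so a practical version would need dimensionality reduction or covariance regularization, which this sketch omits.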

Future research will consider pose-angled face detection and detecting faces in video. One
possibility is to discretize the pose space and design a face detection algorithm for each possible
pose. Such algorithms should consider, among other things, feature analysis and statistical modeling
of the different pose classes. Regarding detecting faces in video, one possibility is to use motion
information to quickly detect regions of interest, or ROIs, in the video, and then apply detection
algorithms, such as the BDF method introduced in this paper, to the ROIs to locate faces.

Acknowledgments: The author would like to thank the anonymous reviewers for their critical and 
constructive comments and suggestions. 
Figure 2. Discriminating feature analysis of the mean face and the mean nonface. (a) The first image is the mean
face, the second and the third images are its 1-D Haar wavelet representation, and the last two bar graphs are its
amplitude projections. (b) The mean nonface, its 1-D Haar wavelet representation, and its amplitude projections.
Note that the images and projections in (b) resemble their counterparts in (a) due to the fact that the nonface
samples lie close to the face class.
Figure 3. Face detection examples from SET1. A square indicates a face region successfully detected. The
resolution of the images is 256 x 384, and the faces are detected at different scales.
Figure 5. Detection of multiple frontal faces. From left to right, top to bottom, the images are labeled (a), (b),
(c), (d), (e), (f), (g), and (h), respectively.
Figure 7. Detection of rotated faces. From left to right, top to bottom, the images are labeled (a), (b), (c), and
(d), respectively. 
Figure 8. Detection of large frontal faces.

Figure 9. Detection of small frontal faces.
Figure 10. Face detection in low quality images.
Figure 12. Examples of missed faces and false detection. From left to right, top to bottom, the images are
labeled (a), (b), and (c), respectively. A masked face in (a), a baby face in (b), and a slightly pose-angled face in
(c) are not detected. A false detection occurs in (c).
Figure 13. The single response criterion and the early exclusion criterion. (a) The single response, q, and the
eliminated area, B. (b) A 16 x 16 subimage with 3 labeled regions corresponding to the left eye area (A), the
nose bridge area (B), and the right eye area (C), respectively.
1. A method of detecting faces in an image, the method comprising the
steps of: 

Forming a discriminating feature representation of an image; 

Calculating, based upon said discriminating feature representation, a 
first posterior probability that a pattern belongs to a face class; 

Calculating, based upon said discriminating feature representation, a 
second posterior probability that a pattern belongs to a nonface class, 
and 

Classifying said feature representation as being a face if said first 
probability is greater than said second probability. 

2. The method of claim 1 wherein the face class contains fewer images 
than said nonface class. 

3. The method of claim 1 further comprising normalizing plural images and 
utilizing said normalized images for training. 